The Debate Over AI Training and Copyright: What You Need to Know

The big picture: U.S. copyright law was designed with the flexibility to adapt to generations of technological advancements while maintaining its core purpose: protecting rights and spurring innovation. Artificial intelligence is no exception.

  • Why it matters: The fair use doctrine provides a durable yet adaptable framework. It balances the protection of individual rights with the need to foster innovation, allowing AI to draw on diverse content types, from written works to visual art, in ways that drive progress.

How it works: AI models process information by breaking it down into numerical data points (text, for instance, is split into small units called tokens, each represented as numbers). Large models are trained on billions of these data points. During the training process, models identify patterns, relationships, and context across the training data.

  • Why it matters: AI models don’t store original works like a database. Instead, they capture statistical information about how words or images are used in relation to one another, which lets individuals use AI tools to accurately perform tasks such as generating written responses to inquiries or recognizing images.
  • A human parallel: Similar to how humans learn, AI models synthesize information from vast amounts of data to identify patterns and draw connections. For example, when writing or speaking, we don’t consciously recall a specific source for each idea. Instead, our knowledge builds over years of exposure to various forms of information, enabling us to process and create new content without referencing exact sources.
  • For example: When a model frequently encounters terms like “kids,” “sandbox,” and “playground” near each other, it learns that these terms are closely related. In contrast, terms like “sandbox,” “Mars,” and “monkey” appear together far less often. As a result, when given a prompt like “What do kids like to do at the playground?” the model is more likely to predict a response such as “Kids like to play in the sandbox at the playground,” reflecting the stronger connections between these words in the training data.
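
The word-association idea above can be illustrated with a toy sketch. This is a simplified illustration, not how production AI models are built: real models learn numerical parameters through neural-network training rather than keeping raw pair counts. The corpus and the function names (`cooccurrence_counts`, `pair_count`) are hypothetical, invented only for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus, invented for illustration only.
CORPUS = [
    "kids play in the sandbox at the playground",
    "the playground has a sandbox for kids",
    "kids love the sandbox",
    "a monkey lives in the jungle",
    "mars is a distant planet",
]


def cooccurrence_counts(sentences):
    """Count how often each unordered pair of distinct words shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        # Sort the unique words so each pair is always stored in the same order.
        words = sorted(set(sentence.split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


def pair_count(counts, a, b):
    """Look up the tally for a pair of words, regardless of argument order."""
    return counts[tuple(sorted((a, b)))]


counts = cooccurrence_counts(CORPUS)
print(pair_count(counts, "kids", "sandbox"))        # 3
print(pair_count(counts, "sandbox", "playground"))  # 2
print(pair_count(counts, "sandbox", "mars"))        # 0
```

Note that `counts` holds only tallies of word pairs. The original sentences cannot be read back out of it, which mirrors the point that a model captures statistical relationships rather than storing works like a database.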

Analyzing visual data: When it comes to images, models can be trained on different visuals that share common characteristics. Later, when a model analyzes a photo by examining pixel patterns to detect shapes and colors, it can distinguish common objects like a “tree” or “house.” Additionally, it can differentiate a “beach” scene from a “mountain” landscape by recognizing key elements like sand, water and sky in the former, or rocky terrain and peaks in the latter.
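
As a rough sketch of the same idea for images, the example below "trains" on a few labeled toy images by averaging their pixel colors, then labels a new image by finding the closest average. This is a deliberately simple nearest-average method chosen for illustration. Real image models learn far richer features (shapes, edges, textures) with neural networks, and every name, color value, and image here is a hypothetical assumption.

```python
def mean_color(colors):
    """Average a list of (R, G, B) tuples into one (R, G, B) tuple."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))


def train(examples):
    """Compute one average color per label from (label, pixels) examples."""
    per_label = {}
    for label, pixels in examples:
        per_label.setdefault(label, []).append(mean_color(pixels))
    return {label: mean_color(colors) for label, colors in per_label.items()}


def classify(averages, pixels):
    """Return the label whose average color is closest to this image's."""
    color = mean_color(pixels)

    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(color, averages[label]))

    return min(averages, key=squared_distance)


# Hypothetical pixel colors standing in for scene elements.
SAND, WATER, SKY = (230, 210, 160), (60, 130, 200), (135, 206, 235)
ROCK, SNOW = (110, 110, 110), (240, 240, 245)

# Toy "images": each is just a flat list of ten pixels.
TRAINING = [
    ("beach", [SAND] * 5 + [WATER] * 3 + [SKY] * 2),
    ("beach", [SAND] * 4 + [WATER] * 4 + [SKY] * 2),
    ("mountain", [ROCK] * 6 + [SNOW] * 2 + [SKY] * 2),
    ("mountain", [ROCK] * 7 + [SNOW] * 1 + [SKY] * 2),
]

averages = train(TRAINING)
new_beach = [SAND] * 6 + [WATER] * 2 + [SKY] * 2
new_mountain = [ROCK] * 6 + [SNOW] * 1 + [SKY] * 3
print(classify(averages, new_beach))     # beach
print(classify(averages, new_mountain))  # mountain
```

After training, `averages` contains just one color summary per label. The training images themselves are not kept, yet the summary is enough to label new images that share the same characteristics.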

Dive deeper

Under U.S. copyright law, creators of original works have exclusive rights, but the law also provides limitations and exceptions, including fair use. A key aspect of fair use is whether using material from a copyrighted work has a transformative purpose, meaning it leverages the existing work in a new and different way.

  • What it means: The use of materials in AI training qualifies as transformative when it doesn’t substitute for the original works and merely derives statistical data to perform different tasks.
  • The bottom line: U.S. copyright law provides the flexibility needed to address emerging technologies, and the courts are well-equipped to handle questions around AI and fair use.

How fair use promotes innovation

Throughout history, technological advancements have relied on the flexibility of copyright law, including fair use, to permit copying of protected works in order to deliver transformative innovations.

  • For example: The Supreme Court ruled in 1908, in White-Smith Music Publishing Co. v. Apollo Co., that piano rolls (perforated rolls of paper from which player pianos automatically played music) were not copies of the sheet music they encoded and therefore didn’t infringe sheet music copyrights, allowing music to be consumed in a new way.
  • More recently: Copyright-protected mapping data has allowed various smartphone apps to revolutionize turn-by-turn navigation, while the indexing of websites owned by various copyright owners enabled search engines to organize and make information accessible to billions.

The bottom line: As AI continues to evolve, the time-tested fair use doctrine can advance innovation and protect rights.